graph,graphql,server,store: Subgraph Sql Service #5382
Conversation
Wow .. that's a really nice addition! Before I start reviewing this in detail, I noticed a few things from a quick look at the PR:
Commits (signed off by Gustavo Inacio <gustavo@semiotic.ai> and Tümay Tuzcu <tumay@semiotic.ai>):
- executor: implement methods to execute arbitrary sql in query store and deployment store
- create store/postgres/src/sql
- validator and formatter: full refactor
- refactor: move sql to store/postgres/src/sql
- parser: test for array of byte fixed
- parser: block_range and block$ added as available columns for tables, sequence functions moved to black list
- parser: use latest block as cte filter
- parser: fix escape columns and tables (Fixes STU-217)
- …able_type
- docs: add sql service docs section
- … prelude CTE (Fixes SQL-266)
Hey @lutter, thank you for the information. Here are some notes:
## Issue

As work takes place to [add support for SQL queries](graphprotocol/graph-node#5382), the Gateway should invalidate such queries in the meantime.

## Solution

The Gateway should reject queries containing SQL fields. We want to reuse this code in the `sql-gateway`, so we use the `SqlFieldBehavior` enum to inject the desired behavior.

Signed-off-by: Joseph Livesey <joseph@semiotic.ai>
Co-authored-by: Gustavo Inacio <gustavo@semiotic.ai>
I am not entirely through reviewing this, but didn't want to hold the review back any longer.
Besides the comments I made inline, one major missing piece is end-to-end tests. For GraphQL queries we have a pretty comprehensive test suite in `store/test-store/tests/core/interfaces.rs` and `store/test-store/tests/graphql/query.rs`. I would want a similarly comprehensive test suite for SQL queries - basically, if a change to `graph-node` breaks SQL queries, I want to find out about that from a failing test similar to the ones for GraphQL.
@@ -30,6 +30,8 @@ git-testament = "0.2.5"
 itertools = "0.12.1"
 hex = "0.4.3"
 pretty_assertions = "1.4.0"
+sqlparser = { version = "0.40.0", features = ["visitor"] }
+thiserror = "1.0.25"
We try to avoid importing crates more than once as much as possible (and we're far from perfect on that).
There are two ways in which we do that:
- (the old way) Import the crate in `graph/src/Cargo.toml` and reexport it in `graph/src/lib.rs`. Other crates then say `use graph::<a crate>`
- (the new way) Import the crate in the workspace `Cargo.toml` and add `<a crate>.workspace = true` in crates that use it

I just opened a PR #5403 to update `sqlparser` to the latest. Please use the workspace one and update this code for it.
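As a rough sketch, the workspace pattern described above ("the new way") would look something like this; the exact version and feature flags here just mirror the diff and are otherwise illustrative:

```toml
# Workspace-level Cargo.toml - declare the dependency once
[workspace.dependencies]
sqlparser = { version = "0.40.0", features = ["visitor"] }

# Member crate's Cargo.toml - inherit the workspace version
[dependencies]
sqlparser.workspace = true
```

This keeps every crate in the workspace on the same `sqlparser` version, so bumping it (as PR #5403 does) is a one-line change.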
use crate::data::graphql::ext::{
    camel_cased_names, DefinitionExt, DirectiveExt, DocumentExt, ValueExt,
};
use crate::prelude::s::*;
use crate::prelude::*;
Please avoid `*` imports - they become a huge headache later in some refactorings. Just list everything you actually need.
Also, the single-letter crates, like `s`, `q` and `r`, should always be used as such, i.e., code should always say `s::Document` instead of just `Document`
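A minimal, self-contained sketch of the convention the reviewer asks for - the module layout here is hypothetical and only stands in for graph's prelude; the point is the call-site style:

```rust
// Hypothetical stand-in for graph's prelude; names are illustrative.
mod prelude {
    pub mod s {
        pub struct Document(pub &'static str);
    }
}

// Import the short alias itself, not `use prelude::s::*;`
use prelude::s;

fn doc_name(doc: &s::Document) -> &'static str {
    doc.0
}

fn main() {
    // Call sites always read `s::Document`, never bare `Document`.
    let doc = s::Document("schema");
    println!("{}", doc_name(&doc));
}
```

With the glob import removed, every use of `Document` is unambiguous, which keeps refactorings (like moving a type between `s` and `q`) mechanical.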
@@ -414,7 +413,7 @@ impl Resolver for StoreResolver {
     return self.lookup_meta(field).await;
 }

-if object_type.is_sql() {
+if ENV_VARS.graphql.enable_sql_service && object_type.is_sql() {
     return self.handle_sql(field);
 }
The `ENV_VARS` check shouldn't be necessary - when SQL is turned off, a query that uses it shouldn't even make it here, as it would have been rejected during validation.
The reason we have this check here is the remote case where the schema contains a type called `Sql`, which a graph-node with the SQL service disabled would otherwise allow.
name if is_introspection_field(name) => intro_set.push(field)?,
META_FIELD_NAME | "__typename" => meta_items.push(field),
SQL_FIELD_NAME => sql_items.push(field),
_ => data_set.push(field)?,
I think the comment above should say something about why `sql_items` gets split out here (it's also missing an explanation of why `meta_items` are split out). In a nutshell:
- `intro_set` needs to be handled differently because we use a different resolver (`IntrospectionResolver`)
- `meta_items` need to be handled separately because `prefetch` can not handle them
- `sql_items` need to be handled separately for the same reason (?) If so, maybe we can just put them into `meta_items` and rename that variable to something more descriptive (maybe `noprefetch_items`, though I don't love that either)
let result = result.into_iter().map(|q| q.0).collect::<Vec<_>>();
let row_count = result.len();
// columns should be available even if there's no data
// diesel doesn't support "dynamic query" so it doesn't return column names
There's now a `DynamicSelectStatement`; PR #5372 uses that, but it requires quite a bit of ceremony. For now, this is fine, but it would be great to avoid the whole `to_jsonb` dance at a later date.
This is good to know. I got to this crate when I was developing, but at the time graph-node wasn't using diesel v2. I'm going to take a look at it.
field accepts values of this type.

To enable the Subgraph:SQL Service, set the `GRAPH_GRAPHQL_ENABLE_SQL_SERVICE` environment
variable to `true` or `1`. This allows clients to execute SQL queries using the
I would just say to set this to `true`
environment variable.

- **Environment Variable:** `GRAPH_GRAPHQL_ENABLE_SQL_SERVICE`
- **Default State:** Off (Disabled)
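For example, enabling the service in the node's environment might look like this (a minimal sketch; how you pass environment variables depends on your deployment):

```shell
# Enable the Subgraph:SQL Service (it is off by default)
export GRAPH_GRAPHQL_ENABLE_SQL_SERVICE=true
echo "$GRAPH_GRAPHQL_ENABLE_SQL_SERVICE"
```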
If you set this to `true` to turn it on, it would make sense to say that the default is `false`
permitted to be used within SQL queries executed by the Subgraph:SQL Service, while `POSTGRES_BLACKLISTED_FUNCTIONS`
serves as a safety mechanism to restrict the usage of certain PostgreSQL functions within SQL
queries. These blacklisted functions are deemed inappropriate or potentially harmful to the
system's integrity or performance. Both constants are defined in `store/postgres/src/sql/constants.rs`.
It's sorta icky, but it would be nicer for users to just list what's on the whitelist. My understanding is that anything not on the whitelist is forbidden. Mentioning a blacklist here just makes users wonder what happens for functions that are neither on the whitelist nor the blacklist.
The GraphQL schema provided by the Subgraph:SQL Service reflects the structure of the SQL queries
it can execute. It does not directly represent tables in a database. Users need to
construct SQL queries compatible with their database schema.
This does not explain at all how I go from the GraphQL subgraph schema to the schema against which I can execute queries. If I have an entity type `DailyPositionSnapshot`, what's the table I query? Does it matter if that type is mutable, immutable, or a timeseries?
To avoid locking ourselves into how data is currently stored, I would suggest the following:
- the table for an `@entity` type is the name of the type in snakecase
- the columns of the table are all the non-derived attributes of the type, also snakecased
- for aggregations, the table name is the name of the type snakecased together with the interval, i.e. `type Stats @aggregation(intervals: ["hour", "day"], source: "Data") { .. }` becomes `stats('hour')` in a SQL query
- queries are always executed at a specific block, determined by the block constraint on the `sql` element
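Under the suggested mapping, queries against the `DailyPositionSnapshot` entity and the `Stats` aggregation from the examples above would look roughly like this; the column names are illustrative assumptions, not part of the proposal:

```sql
-- Entity type DailyPositionSnapshot -> snake_cased table,
-- columns are the non-derived attributes, also snakecased
SELECT id, position_count
FROM daily_position_snapshot
LIMIT 10;

-- Aggregation type Stats, queried for the "hour" interval
SELECT *
FROM stats('hour');
```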
Thanks, you (@lutter) have already summed up part of it. Let's expand on this with examples.
Can you give a subgraph example with aggregation?
This subgraph is kinda silly because it just aggregates block numbers, but it explains how to use aggregations in various ways. The aggregation docs also have some more realistic examples.
use super::{DocumentExt, ObjectTypeExt};

#[derive(Copy, Clone, Debug)]
pub enum QueryableType<'a> {
I like the name `QueryableType` much better than `ObjectOrInterface`; as I understand it, adding `union` here is necessary because of the return type of the `sql` query element, right?
Exactly! We want the user to select which type of response: JSON or CSV. This is also future-proof for possible new formats.
Semiotic Labs is thrilled to present the Subgraph SQL Service in partnership with TheGuild and Edge & Node. This exposes a GraphQL type to any graph-node user, providing a secure SQL interface to directly query a subgraph's entities.
SQL Service
We introduce a new `sql` field on the `Query` type that exposes inputs so graph-node can receive raw SQL.
We also implemented extensive safe SQL rewriting to remove functions that we think may break the Postgres instance, along with some schema safety checks.
We also expose each of the tables via Common Table Expressions, where we have granular control over what is and isn't exposed, plus a workaround that exposes only the latest block's data.
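A query using the new field might look roughly like this; this is a hedged sketch - the argument names and the output type names in the inline fragment are illustrative assumptions, since the exact shape is defined by the PR itself:

```graphql
{
  sql(
    input: {
      query: "SELECT id, amount FROM swap LIMIT 5"
      format: JSON   # the union output lets clients pick JSON or CSV
    }
  ) {
    ... on SqlJSONOutput {
      columns
      rowCount
      rows
    }
  }
}
```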
What we cover in this PR:
- `sql` query type
- `union` output

What we do not cover in this PR: